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Abstract. A Bayesian approach is developed to determine quantum mechan- 
ical potentials from empirical data. Bayesian methods, combining empirical 
measurements and a priori information, provide flexible tools for such em- 
pirical learning problems. The paper presents the basic theory, concentrating 
in particular on measurements of particle coordinates in quantum mechanical 
systems at finite temperature. The computational feasibility of the approach 
is demonstrated by numerical case studies. Finally, it is shown how the ap- 
proach can be generalized to such many-body and few-body systems for which 
a mean field description is appropriate. This is done by means of a Bayesian 
inverse Hartree-Fock approximation. 
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1 Introduction 

The problem addressed in this paper is the reconstruction of Hamiltonians of 
quantum systems from observational data. Finding such "causes" or "laws" from 
a finite number of observations constitutes an inverse problem and is typically 
ill-posed in the sense of Hadamard [1-8]. 

Two research fields deal in particular with the reconstruction of potentials 
from spectral data (energy measurements): inverse spectral theory and inverse 
scattering theory. Inverse spectral theory characterizes the kind of data necessary, 
in addition to a given spectrum, to determine the potential [7, 9-13]. (See also 



Sect. 3.2.1.) Inverse scattering theory, in particular, considers, in addition to the 
spectrum, boundary data obtained 'far away' from the scatterer. Those can be, 
for example, phase shifts obtained from scattering experiments 14, 15 1. 



In this paper, contrasting those two approaches, we will not exclusively be 
interested in spectral data, but will develop a formalism which allows to extract 
information from quite heterogeneous empirical data. In particular, we will con- 
sider in more detail the situation where the position of a quantum mechanical 
particle has been measured a finite number of times. 

Due to increasing computational resources, the last decade has also seen a 
rapidly growing interest in applied empirical learning problems. They appear as 
density estimation, regression or classification problems and include, just to name 
a few, image reconstruction, speech recognition, time series prediction, and object 
recognition. Many disciplines, like applied statistics, artificial intelligence, com- 
putational and statistical learning theory, statistical physics, and also psychology 
and biology, contributed in developing a variety of learning algorithms, including 
for example smoothing splines |l6j , regularization and kernel approaches j| , sup- 
port vector machines |17|, [Tq] , generalized additive models |l9|], projection pursuit 



regression |2(J, expert systems and decision trees [21|, neural networks [22|, and 
graphical models [23]. 



Recently, their has been much work devoted to the comparison and unification 
of methods arising from different disciplines. (For an overview and comparison 
of methods see for example [p4f].) Hereby, especially the Bayesian approach to 
statistics proved to be useful as a unifying framework for empirical learning [22, 
25-36]. Bayesian approaches put special emphasis on a priori information which 
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always has to accompany empirical data to allow successful learning. 

The present paper is written from a Bayesian perspective. In particular, a 
priori information will be implemented in form of stochastic processes [37]. Com- 
pared to parametric techniques this has the advantage, that a priori information 
can typically be controlled more explicitly. Technically, this approach is intimately 
related to the well known Tikhonov regularization ||, ||. For an outline of the 
basic principles see also [p8| . 

The paper is organized as follows: Sect. [2] gives a short introduction to 
Bayesian statistics. Sect. ^ applies the Bayesian approach to quantum mechanics 
and quantum statistics, with Sect. 3.1 concentrating on the treatment of empirical 
data for quantum systems and Sect. 3^2 discussing the implementation of a priori 
information. Sect. [3.3| presents two numerical case studies, the first dealing with 
the approximation of approximately periodic potentials, the second with inverse 
two-body problems. Sect. |] shows how the approach can be applied to many- 
body systems, including the fundamentals of an inverse version of Hartree-Fock 
theory. Finally, Sect. || concludes the paper. 



2 The Bayesian approach 

2.1 Basic notations 

A Bayesian approach is based upon two main ingredients: 

1. A model of Nature, i.e., a space 7i of hypotheses h defined by their likeli- 
hood functions p(x\c, h). Likelihood functions specify the probability density 
for producing outcome x (measured value or dependent visible variable, 
assumed to be directly observable) under hypothesis h (possible state of 
Nature or hidden variable, assumed to be not directly observable) and con- 
dition c (measurement device parameters or independent visible variable, 
assumed to be directly observable). 

2. A prior density po(h) = p(h\Do) defined over the space H of hypotheses, 
Dq denoting collectively all available a priori information. 

Now assume (new) training data Dt = (xt, ct) = {(xi, < i < n } become 
available, consisting of pairs of measured values Xi under known conditions Cj (and 
unknown h). Then Bayes' theorem 

p(x T \c T ,h)po(h) 

P{h\D) = j r , 1 

Po{xt\ct) 

is used to update the prior density po(h) = p(h\Do) to get the (new) posterior 
density p{h\D) = p(h\DT, Dq). Here we wrote D = (Dt,Dq) to denote both, 
training data and a priori information. Assuming i.i.d. training data Dt the like- 
lihoods factorize p(xT\cT,h) = YYl p(xi\ci, h). Note that the denominator which 
appears in Eq. ([[]), i.e., Po(xt\ct) = / dh p(xt\ct , h) Po{h) , is /i-independent. It 
plays the role of a normalization factor, also known as evidence. Thus, the terms 
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in Eq. ([!]) are named as follows, 

likelihood x prior 

posterior = . (2) 

evidence 

To make predictions, a Bayesian approach aims at calculating the predictive 
density 

p{x\c,D) = J dhp(x\c,h) p(h\D), (3) 

which is a likelihood average weighted by their posterior probability. The fa- 
integral can be extremely high dimensional, and often, like in the case we are 
focusing on here, even be a functional integral [39, [5]] over a space of likelihood 



functions 7i. In as far as an analytical integration is not possible, one has to treat 
the integral, for example, by Monte Carlo methods [30, 41-44] or in saddle point 
approximation |3^, ^5], [jq| . Assuming the likelihood term p(x\c, h) to be slowly 
varying at the stationary point the latter is also known as maximum posterior 
approximation. In this approximation the /i-integration is effectively replaced by 
a maximization of the posterior, meaning the predictive density is approximated 
by 

p(x\c,D)^p(x\c,h*), (4) 

where 

h* = axgmax heH p(h\D) = argmin fteW [- hxp(h\D)]. (5) 

The term — \np(h\D) is also often referred to as (regularized) error functional 
and indeed, a maximum posterior approximation is technically equivalent to min- 
imizing error functionals with Tikhonov regularization [2-4, 47]. The difference 
between the Bayesian approach and the classical Tikhonov regularization is the 
interpretation of the extra term as costs or as a priori information, respectively. 

Within a maximum likelihood approach an optimal hypothesis h is obtained 
by maximizing only its training likelihood p(xt\ c Ti h) instead of its complete pos- 
terior. This is equivalent to a maximum posterior approximation with uniform 
prior density. A maximum likelihood approach can be used for hypotheses h = 
h(£), parameterized by (vectors) £. A maximum likelihood approach is possible 
if that parameterization is restrictive enough (and well enough adapted to the 
problem), so no additional prior is required to allow generalization from training 
to non-training data. For completely flexible nonparametric approaches, however, 
the prior term is necessary to provide the necessary information linking training 
data and (future) non-training data. Indeed, if every number p(x\c, h) is consid- 
ered as a single degree of freedom [restricted only by the positivity constraint 
p(x\c, h) > and the normalization over x] then, without a priori information, 
training data contain no information about non-training data. 



2.2 An Example: Regression 



Before applying the Bayesian framework to quantum theory, we shortly present 
one of its standard applications: the case of (Gaussian) regression. (For more 
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details see for example p4[.) This also provides an example for the relation be- 
tween the Bayesian maximum posterior approximation and the minimization of 
regularized error functionals. 

A regression model is a model with Gaussian likelihoods, 

1 (s-fc(c)) 2 

p(x\c,h) = e ^ , (6) 

V2"7TC7 



with fixed variance a 2 . The function h(c) is known as regression function. (In 
regression, one often writes y for the dependent variable, which is x in our no- 
tation, and x for the "condition" c.) Our aim is to determine an approximation 
for h(c) using observational data D = {(xj,Cj)|l <i< n}. Within a parametric 
approach one searches for an optimal approximation in a space of parameterized 
regression functions /i(c;£). For example, in the simple cases of a constant or a 
linear regression such a parameterization would be /i(c;£) = £ or /i(c;£i,£2) = 
+ £ii respectively. If the parameterization is restrictive enough then a prior 
term po(h) is not needed and maximizing the likelihood over all data D is thus 
equivalent to minimizing the squared error, 

E sq (0 = ^2(x l -h(c i ;0) • (7) 

i 

There are, however, also very flexible parametric approaches, which usually do 
require additional a priori information. An example of such a nonlinear one- 
parameter family has been given by Vapnik and is shown in Fig. [I]. Without 
additional a priori information, which may for example restrict the number of 
oscillations, such functions can in most cases not be expected to lead to useful 
predictions. Nonparametric approaches, which treat the numbers h(c) as single 
degrees of freedom, are even more flexible and do always require a prior po(h). 
For nonparametric approaches such a prior can be formulated in terms of the 
function values h(c). A technically very convenient choice is a Gaussian process 
prior H|, H|, 

Po(h) = (det ^Ko) 2 e -t<ft-fto|Ko|fc-ftc> (8) 

with mean ho, representing a reference or template for the regression function h, 
and inverse covariance AKo given by a real symmetric, positive (semi-) definite 
operator scaled by A and acting on functions h. The operator Ko defines the 
scalar product, 

<h-ho\K Q \h-h >= j dcdc [h(c) - h (c)]K (c,c')[h(c') - h (c)}. (9) 

Typical priors enforce the regression function h to be smooth. Such smoothness 
priors are implemented by choosing differential operators for Ko- For example, 
taking for Ko the negative Laplacian and choosing a zero mean ho = 0, yields 

< h- h | K | h - h >=- Jdc h{c) ^h(c) = Jdc (^^) , (10) 
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Figure 1. Examples of parametric regression functions with increasing flexibility (with 3 data 
points). L.h.s: A fitted constant h(c) = £. Middle: A linear h(c) — $2C + £i. R.h.s: The function 
h(c) = sin(£c) can fit an arbitrary number of data points (with different a and \xi\ < 1) 
well p?| . Additional a priori information becomes especially important for flexible approaches. 

where we integrated by parts assuming vanishing boundary terms. In statistics 
one often prefers inverse covariance operators with higher derivatives to obtain 
smoother regression functions [16, 50-55]. An example of such a prior with higher 
derivatives is a "Radial Basis Functions" prior with the pseudo-differential oper- 
ator Ko = exp (— cr^gpA/2) as inverse covariance. 

Maximizing the predictive density p(x\c,D) for a Gaussian prior (g) is equiv- 
alent to minimizing the regularized error functional 

1 n ^ A' 

Ercg(h) = - (a* - h(ci)j + — <h- h \K \h- h > . (11) 

i 

The "regularization" parameter A' = Act 2 , representing a so called hyperparam- 
eter, controls the balance between empirical data and a priori information. In a 
Bayesian framework one can include a hyperprior p(A) and either integrate over 
A or determine an optimal A in maximum posterior approximation p6fl . Al- 
ternative ways to determine A are crossvalidation techniques |i~6|], the discrepancy 
and the self-consistent method f56]| . For example in the case of a smoothness 
prior, a larger A' will result in a smoother regression function h* . It is typical for 
the case of regression that the regularized error functional E reg (h) is quadratic 
in h. It is therefore easily minimized by setting the functional derivative with 
respect to h to zero, i.e., 5E reg /5h = 5hE reg = 0. This stationarity equation is 
then linear in h and thus has a unique solution h* . (This is equivalent to so called 
kernel methods with kernel Kq . It is specific for regression with Gaussian prior 
that, given Kq , only an n-dimensional equation has to be solved to obtain h*. ) 
As the resulting maximum posterior solution p(x\c, h*) is Gaussian by definition, 
we find for its mean 

Jdxxp(x\c,h*) = h*(c). (12) 

It is not difficult to check that, for regression with a Gaussian prior, h*(c) is also 
equal to the mean J dx xp(x\c, D) of the exact predictive density @). Furthermore 
it can be shown that, in order to minimize the squared error [x — h(c)] 2 for (future) 
test data, it is optimal to predict outcome x = h(c) for situation c. 
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In the following sections we will apply the Bayesian formalism to quantum the- 
ory. Hence, training data X{ will represent the results of measurements on quan- 
tum systems and conditions Cj will describe the kind of measurements performed. 
Being interested in the determination of quantum potentials our hypotheses h will 
in the following represent potentials v. 

3 Inverse quantum statistics 

3.1 The likelihood model of quantum theory 
3.1.1 Measurements in quantum theory 

The state of a quantum mechanical system is characterized by its density operator 
p. In particular, the probability of measuring value x for observable O in a state 
described by p is known to be |57], [58| 



This defines the likelihood model of quantum theory. The observable O, repre- 
sented by a hermitian operator, corresponds to the condition c of the previous sec- 
tion. The projector Po(%) = Yli \ x i>< x i \ projects on the space of eigenfunctions 
| X[ > of O with eigenvalue x, i.e., for which 0\ x\ > = x\ xi >. For non-degenerate 
eigenvalues Po(x) = \x><x\. 

To be specific, we will consider the measurement of particle positions, i.e., 
the case O = x with x being the multiplication operator in coordinate space. 
However, the formalism we will develop does not depend on the particular kind 
of measured observable. It would be possible, for example, to mix measurements 
of position and momentum (see, for example, Section |3.2.5| ). 

For the sake of simplicity, we will assume that no classical noise is added by the 
measurement process. It is straightforward, however, to include a classical noise 
factor in the likelihood function. If, for example, the classical noise is, conditioned 
on Xi , independent of quantum system then 



where we denoted the 'true' coordinates by X{ and the corresponding noisy output 
by xi. A simple model for p(xi\xi) could be a Gaussian. 

In contrast to the (ideal) measurement of a classical system, the measurement 
of a quantum system changes the state of the system. In particular, the measure- 
ment process acts as projection Po(x) to the space of eigenfunctions of operator 
O with eigenvalues consistent with the measurement result. Thus, performing 
multiple measurements under the assumption of a constant density operator p 
requires special care to ensure the correct preparation of the quantum system be- 
fore each measurement. In particular, considering a quantum statistical system 
at finite temperature, as we will do in the the next section, the time between two 
consecutive measurements should be large enough to allow thermalization of the 
system. 




(13) 




(14) 
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3.1.2 Likelihood in the canonical ensemble 

From now on we will consider a quantum mechanical canonical ensemble at tem- 
perature 1/(3. Such a system is described by a density operator 

e -fSH 

H denoting the Hamiltonian of the system. Specifically, we will focus on repeated 
measurements of a single particle in a heat bath of temperature 1/(3 with sufficient 
time between measurements to allow the heat bath to reestablish the canonical 
density operator. For (non-degenerated) particle coordinates xi the likelihood for 
p becomes the thermal expectation 

p(xi\x,p) = Tr (P £ ( Xi )p) = Y,Pa\Mxi)\ 2 =< \Hxi)\ 2 >, (16) 

a 

< • • • > denoting the thermal expectation with probabilities 

P«= e —, Z = ^e~^, (17) 

a 

and energies and orthonormalized eigenstates 

H\(f> a > = E a \<p a >. (18) 

In particular, we will consider a hermitian Hamiltonian of the standard form 
H = T + V, with kinetic energy T, being l/(2m) times the negative Laplacian 
— A for a particle with mass m (setting h = 1), and local potential V(x,x') = 
v{x)5{x — x'). Thus, in one dimension 

H(x, x>) = 6(x - x <)(-±-^- 2+ „(*)) , (19) 

where the 5-functional is usually skipped. 

For n independent position measurements X{ the likelihood for p(v), and thus 
for v, becomes (writing now p(xi\x, p) = p(xi\x,v)) 

n n 

p(x T \x,v) = Y[p{ Xi \x,v) = n < l^(^)| 2 > • (20) 
i=i i=i 

We remark that it is straightforward to allow (3 to vary between measurements. 

3.1.3 Maximum likelihood approximation 

A maximum likelihood approach selects the potential v with maximal likelihood 
p(xt\x, v) under the training data. Beginning with a discussion of the parametric 
approach we consider a potential v(^,x) parameterized by a parameter vector £ 
with components £/. To find the parameter vector which maximizes the training 
likelihood we have to solve the stationarity equation 



= 8^p(xt\x, v ), 



(21) 
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<9g = d/d£ denoting the gradient operator with components d^ r Obtaining from 
Eq. © ' 

dtp(xi\x,v) = <{d^*{xi))4>{xi)> + <(j ) *{x i ){d^{xi))> (22) 
- p (< \(t>(xi)\%E >-< \<P{xi)\ 2 X d^E >) , 

we see that to solve Eq. (|2l|) we have to calculate the derivatives of the eigen- 
values d^E a and of the eigenfunctions at the data points d^(p a (xi). Those are 
implicitly defined by the eigenvalue equation for H = H(v). To proceed we take 
the derivative of the eigenvalue equation ( |l8|) 

(%ff) \</) a > + H\ d^a > = (d^E a ) \(f> a > + E a \ 8^ a >. (23) 

Projecting onto <4> a | yields, using d^H = d^v and the hermitian conjugate of 



Eq. (18) we arrive at 



*E a = ^j^ , (24) 

< <Pa | Pa > 

{E a -H)\8^ a > = (d^v-d^E a )\ 9a >. (25) 

Because all orbitals with energy E a (which may be more than one if E a is de- 
generated) are in the null space of the operator (E a — H), Eq. (|25|) alone does 
not determine d^(p a uniquely. We also notice, that because the left hand side of 
Eq. (|25| ) vanishes if projected on a eigenfunction c/> 7 with EL = E a we find for 
degenerate eigenfunctions < 7 | d^H | (f> a > = 0, if we choose < <fi a | <^> 7 > = 5 aa . 
A unique solution for Oc<p a can be obtained be setting < </> 7 | d^(f) a > = for eigen- 
functions 7 with Eq, = 2? 7 . This corresponds to fixing normalization and phase 
of eigenfunctions and, in case of degenerate eigenvalues, uses the freedom to work 
with arbitrary, orthonormal linear combinations of the corresponding eigenfunc- 
tions. Because the operator (E a — H) is invertible in the space spanned by all 
eigenfunctions (p^ with different energy E~ ^ E a , this yields, using orthonormal 
eigenfunctions, 

>= Yl F 1 F I ^7 >< ^7 I d H v I <!><* > ■ ( 26 ) 

For nondegenerated energies the sum becomes The stationarity equation 

(21) can now be solved iteratively by starting from an initial guess v° for v, 
calculating E a (v) and <j> a (y) to obtain d^E a and d^(j) a (xi) from Eqs. (24,25]) and 



thus d^p(xi\x,v) from Eq. (|22j). Then a new guess for v is calculated (switching 
to log-likelihoods) 



v new = v old + r ] A- 1 ^2d i lnp(x i \x,v old ), (27) 

i 

with some step width ij and some positive definite operator A (approximating for 
example the Hessian of 1hp(xt\x, v)). This procedure is now iterated till conver- 
gence. 
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While a parametric approach restricts the space of possible potentials v, a 
nonparametric approach treats each function value v(x) itself as individual degree 
of freedom, not restricting the space of potentials. The corresponding nonpara- 
metric stationarity equation is obtained analogous to the parametric stationarity 
equation ( |2~I|) replacing partial derivatives with the functional derivative op- 
erator S v = 5/5v with components 5 v r x ) =5/ (5v(x)) [|59]]. Because the functional 
derivative of H is simply 

K{x)H{x\x") = S v ^V(x',x") = 5(x - x')5(x - x"), (28) 

we get, using the same arguments leading to Eq. (p6|) 



r <<t>a\5 v (x)H\<t>a> . , , Nl2 , „s 

S v (x)E a = ~T|~7 ^ = Mx) , 29 

< Pa | 4>a > 
1 

<£u(aO0a(a/) = E -E ^(^O^O^^O*^ ( 30 ) 



and therefore 

^(a!)P(a:i|«,v) = < (^(a;)0*(a5i)) <K#i) > + < (zi)8 v ( x )<l>(xi) > (31) 

- /? (< ^(xi)! 2 !^)! 2 > - < |<M^)I 2 >< |#r)| 2 >) . 

(The partial derivative with respect to parameters £ and the functional deriva- 
tive with respect to v(x) are related according to the chain rule d^p(xi\x,v) = 
Jdx (d^v(x)) 5 v r x \p(xi\x,v) = v^5 v p(xi\x,v) with operator v^(l,x) = d^v(^,x).) 

The large flexibility of the nonparametric approach allows an optimal adap- 
tion of v to the available training data. However, as it is well known in the context 
of learning it is the same flexibility which makes a satisfactory generalization to 
non-training data (e.g., in the future) impossible, leading, for example, to 'patho- 
logical', 5-functional like solutions. Nonparametric approaches require therefore 
additional restrictions in form of a priori information. In the next section we will 
include a priori information in form of stochastic processes, similarly to Bayesian 
statistics with Gaussian processes [16, 37, 44, 48, 60-63] or to classical regular- 
ization theory ]2|, [|, 16 1. In particular, a priori information will be implemented 



explicitly, by which we mean it will be expressed directly in terms of the function 
values v(x) itself. This is a great advantage over parametric methods where a pri- 
ori information is implicit in the chosen parameterization, thus typically difficult 
or impossible to analyze and not easily adapted to the situation under study. In- 
deed, because it is only a priori knowledge which relates training to non-training 
data, its explicit control is essential for any successful learning. 

3.2 Prior models for potentials 

3.2.1 The need for a priori information 

Typical results of inverse spectral theory show that, for example, a one- 
dimensional local potential can be reconstructed if a set of two complete spectra 
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{Ea^}f^, {Ea^}f^ is given for two different boundary conditions for (j) a |/], [T^ ], 
Alternatively, a single spectrum is sufficient, if either a complete set of norming 
constants u a = J 4> a dx is given (for a certain normalization of 4> a which fixes the 
values of (f) a on the boundary) || or the potential is already known on half of the 
interval [64]. Results from inverse scattering theory show under which circum- 



stances a potential can be reconstructed from, e.g., a complete spectrum and the 



phase shifts as function of energy |12| , 14, 15]. In practice, however, the number 



of actual measurements can only be finite. Thus, even if noiseless measurement 
devices would be available, an empirical determination of a complete spectrum, 
or of phase shifts as function of energy, is impossible. Therefore, to reconstruct 
a potential from experimental data in practice, inverse spectral or inverse scat- 
tering theory has to be combined with additional a priori information. If such a 
priori information is not made explicit — as we try to do in the following — it 
nevertheless enters any algorithm at least implicitly. 

We address in this paper the measurement of arbitrary quantum mechanical 
observables, not restricted to spectral or scattering data. In particular, we have 
considered the measurement of particle positions. However, measuring particle 
positions only can usually not determine a quantum mechanical potential com- 
pletely. For example, consider the ideal case of an infinite data limit n — * oo for 
a discrete x variable (so derivatives with respect to x have to be understood as 
differences) at zero temperature (i.e., — > oo). This, at least, would allow to 
obtain p(x\x, v) = |0o(x)| 2 to any desired precision. But even when we restrict to 
the case of a local potential, we would also need, for example, the ground state 
energy Eq and <Pq(x)(I)q(x) to determine v(x) from the eigenvalue equation of H 

v(x)-E + — lMx)]2 , (32) 

where (j)'o = d 2 (j) a (x) / dx 2 (or a discretized version thereof). For finite data, a 
nonlocal potential, continuous x, or finite temperature the situation is obviously 
even worse. In the high temperature limit, for example, p(x\x,v) becomes uni- 
form and independent from the potential. Summarizing, even in the ideal case 
where the complete true likelihood p(x\x,v) is assumed to be known, the prob- 
lem of reconstructing potential can still be ill-posed. (The corresponding time- 
dependent problem, i.e., the reconstruction of a potential v given the complete 



time-dependent likelihood, is treated in [65]. A Bayesian approach for time 



dependent systems, based on finite data, can be found in |66|].) Hence, while 



a priori information is crucial for every learning problem [32, ^3|, |67| ], the re- 
construction of a quantum mechanical potential is particularly sensitive to the 
implemented a priori information. 



3.2.2 Gaussian processes and smooth potentials 

In this section we include, in addition to the likelihood terms, a priori informa- 
tion in form of a prior density po{v). Having specified po(y) a Bayesian approach 
aims at calculating the predictive density (|3|). Within a maximum posterior ap- 
proximation the functional integral in Eq. (y) can be calculated by Monte Carlo 
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methods or, as we will do in the following, in saddle point approximation, i.e., 
by selecting the potential with maximal posterior. The posterior density of v is 
according to Eq. ([[]) proportional to the product of training likelihood and prior 

p{v\D)^ PQ {v)\{<\4>{x i )\ 2 > . (33) 

i 

Hence, the maximum likelihood approximation we have discussed in the last sec- 
tion is equivalent to a maximum posterior approximation under the assumption 
of a uniform prior. 

Technically the most convenient priors are Gaussian processes which we al- 
ready have introduced in (|8|) for regression models. Such priors read for v, 

i 

Po (v) = (det ^K ) 2 e -^-vo\K \v-v 0>j (34) 

with mean vq, representing a reference potential or template for v, and real sym- 
metric, positive (semi-) definite covariance operator (1/A)Kq , acting on poten- 
tials v and not on wave functions <p a . The operator Ko defines a scalar product 
and thus a distance measuring the deviation of v from vq. The most common 
priors are smoothness priors where Ko is taken as differential operator. (In that 
case Ko defines a Sobolev distance.) Examples of smoothness related inverse prior 
covariances are the negative Laplacian Ko = — A, which we have already met in 
Eq. (|io|), or operators with higher derivatives like a "Radial Basis Functions" 
prior with pseudo-differential operator Ko = exp (— o"^ BF A/2). 

Finally, we want to mention that also the prior density can be parameterized, 
making it more flexible. Parameters of the prior density, also known as hyperpa- 
rameters, are in a Bayesian framework included as integration variables in Eq. (H) , 
or, in maximum posterior approximation, in the maximization of Eq. (|||) |2^, |2q 1. 
Hyperparameters allow to transform the point-like maxima of Gaussian priors 
to submanifolds of optimal solutions. For a Gaussian process prior, for example, 
the mean or reference potential vq and the covariance Kq 1 /X can so be adapted 
to the data |^2| . 



3.2.3 Approximate symmetries 

To be more general let us consider a priori information related to some approx- 
imate symmetry [S7|. In contrast to an exact symmetry where it is sufficient to 



restrict v to be symmetric, approximate symmetries require the definition of a 
distance measuring the deviation from exact symmetry. In particular, consider a 
unitary symmetry operation 5, i.e., = S r-1 3 denoting the hermitian conju- 
gate of S. Further, define an operator S acting on (local or nonlocal) potentials 
V, by SV = S^VS. In case of an exact symmetry V commutes with 5, i.e., [V, S] 
= and thus SV = S^VS = V. In case of an approximate symmetry we may 
choose a prior 

p (V)^e- Es , (35) 
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with 'symmetry energy' or 'symmetry error' 

Es = \ <V - SV\K S \V -SV>= - <V \K \V>, (36) 

some positive (semi-) definite Kg, hence positive semi-definite K = (I — 
S)'K,s<(I — S), I denoting the identity. (Symmetric V are within the Null space 
of Kg.) If S belongs to a Lie group it can be expressed by a Lie group parameter 
6 and the generator s of the corresponding infinitesimal symmetry operation as 
S(9) = exp(#s). Hence, we can define an error with respect to the infinitesimal 
operation s with, say, Kg = I, 

Es = ^l < ^^\¥^pv > = l < V\sh\V>. (37) 

Choosing, for instance, s as the derivative operator (for vanishing or periodic 
boundary terms) results in the typical Laplacian smoothness prior which measures 
the degree of symmetry of v under infinitesimal translations. 

Another possibility to implement approximate symmetries is given by a prior 
with symmetric reference potential Vs = SVs 

E Vs = l -<V -V S \V -V s >. (38) 

In contrast to Eq. ( |36"| ) which is minimized by any symmetric V, this term is 
minimized only by V = Vs- Note, that also in Eq. ( p6| ) an explicit non-zero 
reference potential Vo can be included, meaning that not V but the difference 
V — Vq is expected to be approximately symmetric. 

Finally, a certain deviation n from exact symmetry might even be expected. 
This can be implemented by including 'generalized data terms' [32] 



E S , K = \(E S (V) - k) 2 = \(\<V- $V | V - SV> - k) 2 , (39) 
similar to the usual mean squared error terms used in regression. 

3.2.4 Mixtures of Gaussian process priors 

Stochastic process priors have, compared to priors over parameters £, the advan- 
tage of implementing a priori knowledge explicitly in terms of the function values 
v(x). Gaussian processes, in particular, always correspond to simple quadratic er- 
ror surfaces, i.e., concave densities. Being technically very convenient, this is, on 
the other hand, a strong restriction. Arbitrary prior processes, however, can easily 
be built by using mixtures of Gaussian processes without loosing the advantage 



of an explicit prior implementation [32, 33, 63, |67j. (We want to point out that 



using a mixture of Gaussian process priors does not restrict v to a mixture of 
Gaussians.) 

A mixture of M Gaussian processes with component means vt and inverse 
component covariances reads 

M M M ,ys 

Po(v) = Y.P( v > k ) = Y,P^Po( v \ k ) = ^^4-t<^l K *l*-**> (40) 
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with Zk = (det ^-K&) 2 and mixture probabilities p(k). The parameter A plays 
the role of an inverse mixture temperature. Analogous to annealing techniques 



changing A allows to control the degree of concavity of the mixture [33, p 



3.2.5 Average energy 

Using a standard Gaussian smoothness prior as in Eq. ( |34| ) with zero reference 
potential vq = (and, say, zero boundary conditions for v) leads to flat potentials 
for large smoothness factors A. Especially in such cases it turned out to be useful 
to include besides smoothness also a priori information which determines the 
depth of the potential. One such possibility is to include information about the 
average energy 

U = Y,PaE a =<E> . (41) 

a 

We may remark, that for fixed boundary values of v a certain average energy 
cannot be obtained by simply adding a constant to the potential. The average 
energy can, however, be set to a value k by including a Lagrange multiplier term 

Eu = v{U-k), (42) 

Similarly, and technically sometimes easier, one can include noisy 'energy data' 
of the form 

P„«eA Eu = ^{U-k)\ (43) 
For // —* oo this results in U — » k so both approaches coincide. 

3.2.6 Maximum posterior approximation 

Let us consider prior densities being a product of a Gaussian prior pq as in Eq. 
(34), or more general a mixture of Gaussian processes as in Eq. (40), and a non- 



Gaussian energy prior p v of the form of Eq. (|43|) . In that case, the stationarity 
equation we have to solve to maximize the posterior density of Eq. (33) reads 

= K(x) lnpo(u) + $v(x) lnPu(u) + XXo) ln P( x i\%> v )- ( 44 ) 



While 5 v r x \p(xi\x,v) has already been calculated in Sect. |3.1.3| , we now need also 
$v(x)Po(v) an d S v r x \p u (v) For a Gaussian po(v) the functional derivative is easily 
found to be 

5 v \np Q = = -XK (v - v Q ). (45) 
Po 

Similarly, for a mixture of Gaussian processes it is not difficult to show that 

M 

5 v lnp = -X^2p (k\v)K k (v -v k ) (46) 



where po(k\v) = p (v, k)/p (v). 
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To get the functional derivative of the non-Gaussian pjj we calculate first 

8 v (x)U =< 5 v{x) E >-p< E5 v{x) E > +f3 <E>< 5 v(x) E > . (47) 

As 6 v ( x }E a has been found in Eq. (|29|) this yields 

5 V(X) E V = »(U-k){< |</>(*)| 2 >-0{<E |0(x)| 2 > -U < \<t>(x)\ 2 >)) . (48) 

Collecting all terms, we can now solve the stationarity equation (|4i|) by iter- 
ation 



y new 



= v old + r/A- 1 (xKo{v -v old ) + lnp( Xi \x, v old ) - 6 v E^j , (49) 

where we introduced Ko = YlkPo(k\ v ) ^fc and vq = Kg 1 ^2kPo{k\v) K^ffc and a 
step width r\ and positive definite iteration matrix A has to be selected. Choosing 
A as the identity matrix means moving in the direction of the gradient of the 
posterior. Taking for A the Hessian one obtains the Newton method. Quasi- 
Newton methods, like the DFP (Davidon-Fletcher-Powell) or BFGS (Broyden- 
Fletcher-Goldfarb-Shanno) variable metric methods, approximate the Hessian 
iteratively [68-72] (For the case of solving for continuous functions sec |7^] . ) 

A simple and useful choice in our case is A = Ko which approximates the 
Hessian. For a single Gaussian prior this choice does not depend on v and has 
thus not to be recalculated during iteration. Eq. ( p9|) then becomes 



v°*" = (i-^v^+r, U 0+ (AKo)"^ J^lnpfofoO ~ S v Eu) j . (50) 

Due to the nonparametric approach for the potential combined with a priori 
information implemented as stochastic process, the Bayesian approach formulated 
in the previous sections is clearly computationally demanding. The situation for 
inverse quantum theory is worse than, e.g., for Gaussian process priors in regres- 
sion problems (i.e., for a Gaussian likelihood, local in the regression function) 
where it is only necessary to work with matrices having a dimension equal to the 



number of training data [16, 48|. In our case, where the likelihood is nonlocal 
in the potential and also non-Gaussian prior terms may occur, the stationarity 
equation has to be solved be discretizing the problem. 

The following section demonstrates that at least one-dimensional problems 
can be solved numerically without further approximation. Higher dimensional 
problems, however, e.g., for many-body systems, require additional approxima- 
tions. In such higher dimensional situations, the potential may be parameterized 
(without skipping necessarily the prior terms) or the problem has to be divided 
in lower dimensional subproblems, e.g., by restricting v to certain (typically, ad- 
ditive or multiplicative) combinations of lower dimensional functions. (Similarly, 



for example to additive models IS] projection pursuit [^y] or neural network like 



|22| approaches.) 
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Figure 2. Reconstruction of an approximately periodic potential from empirical data. The left 
hand side shows likelihoods and the right hand side potentials: Original likelihood and potential 
"true (thin lines), approximated likelihood and potential v (thick lines), empirical density (bars), 
reference potential Vo (dashed). The parameters used are: 200 data points for a particle with 
mass m = 0.25, inverse physical temperature j3 = 4, inverse Laplacian covariance with A = 
0.2. (Average energy U(vt TU c) = —0.3539 for original potential and U(v) — —0.5521 for the 
reconstructed potential.) Notice, that the reconstructed potential v shows clearly the deviation 
from the strictly periodic reference potential vo- 

3.3 Numerical case studies in inverse quantum statistics 
3.3.1 Approximately periodic potentials 

As a first numerical application we discuss the reconstruction of a one- 
dimensional periodic potential. For example, such a potential may represent a 
one-dimensional solid surface. To be specific, assume we expect the potential to 
be periodic, or even more, to be similar to a certain periodic reference poten- 
tial vq. However, we do not want to restrict the approximation to a parametric 
form but want to keep the approximating potential flexible, so it can adapt to 
arbitrary deviations from the periodic reference potential as indicated by the 
data. For example, such deviations may be caused by impurities on an other- 
wise regular surface. Assuming the deviations from the reference to be smooth 
on a scale defined by A, these assumptions can be implemented by a Gaussian 
smoothness prior with mean vq, and, say, Laplacian inverse covariance. Including 
the likelihood terms for the empirical data, and possibly a term pu adapting the 
average energy, we end up with an error functional (negative log-posterior) to be 
minimized 

— mp(v\D) = — ^2 ^p(xi\x, v) — — <v — vq\A\v — vq> +Eu. (51) 

i 

Fig. shows representative examples of numerical results for functional fl5lD 
without energy penalty term Ejj and with a periodic reference potential (dashed 
line), vq(x) = sin(-7rx/3), on a one-dimensional grid with 30 points. Data have 
been sampled according to a likelihood function derived from a 'true' or original 
potential vtrue (shown as thin line) under periodic boundary conditions for (fi a . 
The reconstructed potential (thick line) has been obtained by minimizing Eq. (]5l| ) 
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Figure 3. Same data and parameter as for Fig. g, except for a nonzero energy penalty term 
Ejj with fi — 1000 and k — —0.3539 = U(vt rvic ). While there are, compared to Fig. El only 
slight modifications of the likelihood the average energy of the reconstructed potential, U(v) = 
—0.3532, is now nearly the same as that of the original potential. 



iterating according to Eq. (^) with A = — A A and zero boundary conditions for 
v (so A is invertible) and initial guess = vq. 

Notice, that the distortion of the underlying 'true' potential has been clearly 
identified. On the other hand the reconstructed potential coincides well with the 
periodic reference potential at locations where supported by data. 

We want to stress two phenomena which are typical for the reconstruction of 
potentials from empirical data and can also be seen in the figures. Firstly, the 
approximation of the likelihood function is usually better than the approxima- 
tion of the potential. This is due to the fact that quite different potentials can 
produce similar likelihoods. This emphasizes the relevance of a priori informa- 
tion for reconstructing potentials. Secondly, especially in low data regions, i.e., 
at high potentials, the potential is not well determined. Thus, empirical data 
mainly contribute to the approximation of regions with low potential, while a 
priori information becomes especially important in regions where the potential is 
large. More data will can be obtained for high potential regions when the tem- 
perature is increased which spreads the data over a wider area. At the same time, 
however, the likelihood becomes more uniform at large temperatures, making an 
identification of v more difficult. 

Because the reference potential vq has the same average energy U as the 
underlying original potential the results are already reasonable without energy 
penalty term Ejj. Indeed, Fig. [3| shows the relatively small influence of an addi- 
tional energy penalty term with quite large k on the likelihood function. Thus, the 
approximated probability for empirical data is not much altered. The presence 
of an Ejj term is better visible for the potential. In particular its minima fit now 
better that of the original. In the next section, where we will work with a zero 
reference potential vq = 0, the energy penalty term Ejj will be more important. 
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Figure 4. Approximation of symmetric potential. Shown are likelihoods (left hand side) and 
potentials (right hand side): Original likelihood and potential (thin lines), approximated like- 
lihood and potential (thick lines), empirical density (bars), The parameters used are: 20 data 
points for a particle with m = 0.1, truncated RBF covariances as in Eq. (^ij) with orbf = 7, 
A = 0.001, energy penalty term Ejj with fi = 20 and reference value « = —9.66 = U(vtiue) 
(average energy U(v) — —9.33 for the approximated v, ground state energy Eo(v) = —9.52) 
inverse physical temperature /3 = 1, and a potential fulfilling v(x) = v(—x) and v — at the 
boundaries. 



3.3.2 Inverse two-body problems 

As a second example we study the reconstruction of a two-body potential by 
measuring inter-particle distances x r . Consider the two-body problem 

+ 7—^- + v(xi - x 2 )5(x\ - X2 — x[ + x' 2 )5(xi +x 2 - x[ - x 2 ) (52) 



2m\ 2m 2 

with single particle momenta Pj = —id/dxi. The problem is transformed to a one- 
body problem in the relative coordinates in the usual way by introducing i.e., x r 
= x\ -x 2 , P r = (miPi - m 2 P 2 ) / (mi +m 2 ), x c = (miX\ + m 2 x 2 )/(mi +m 2 ), P c 
= P\+ P 2 , m = (mim 2 )/(mi + m 2 ), and M = mi + m 2 resulting in 

p2 \ 

+ v(x r ) 1pa(x r ) = E a 1p a (x r ). (53) 

2m J 

The total energy is additive £^* otaL (P c ) = E a + P C 2 /(2M) so the thermal proba- 
bilities p total factorize and integrating out the center of mass motion leaves p a = 
e~P Ea /Z, with E a being the eigenvalues of Eq. (|53|). 

Figs. |I| - ^ show typical results for the numerical reconstruction of a one- 
dimensional, strictly symmetric potential, fulfilling v(x) = v(—x) and set to zero 
at the boundaries. Training data have been sampled from a 'true' likelihood func- 
tion (thin lines), resulting in an empirical density P era p{x) = n(x)/n (shown by 
bars), where n(x) denotes the number of times the value x occurs in the train- 
ing data. The 'true' likelihood has been constructed from a 'true' potential (thin 
lines) choosing periodic boundary conditions for the wavefunctions. In contrast 
to Sect. 3.3. l| a zero reference potential vq = and a truncated Radial Basis 
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Figure 5. Same data and parameter as for Fig. ^ with the exception of ctrbf = 4, that means 
with a smaller smoothness constraint, and /i = 5. (U(v) = —9.46.) To allow an easier comparison 
with the reconstructed likelihood the figure shows the symmetrized empirical density P sym = 
(P cmp {x)+P cmp (~x))/2. 




Figure 6. Same data and parameter as for Fig. [5] but with even smaller smoothness constraint 
ctrbf = 1. (a* = 5, empirical density symmetrized, U(v) = —9.59.) Compared with Figs. ^ and 
^, the empirical density is better approximated but not the original potential and its likelihood 
function. 



Function (RBF) prior [32] has been used 



6 rr 2k 

E °RBF 
k\2 k 

k=0 



(-l) k A k , 



(54) 



(A fc denoting the kth iterated Laplacian) which includes, compared to a Laplacian 
prior, higher derivatives, hence producing a rounder reconstructed potential (cmp. 



Sect. 3.2.2 ). The approximated potentials (thick lines) have been obtained by 
iterating Eq. (^), including a term Ey adapting the thermal energy average to 
that of the original potential. As iteration matrix we used A = AKo together with 
an adaptive step size 77. An initial guess for the potential has been obtained by 
adding negative <5-peaks on the data points (except for data on the boundary), 
i.e., v = — &x,xi- The number of iterations necessary to obtain convergence 
has been typically between 50 and 100. 
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Figure 7. Approximation of symmetric potential with mixture of Gaussian process priors. 
The left hand side shows likelihoods and the right hand side potentials: Original likelihood and 
potential (thin lines) , approximated likelihood and potential (thick lines) , symmetrized empirical 
density (bars), and the two reference potentials v\,V2 (dashed, V2 deeper in the middle). The 
parameters used are: 20 data points for a particle with m = 0.1, inverse physical temperature (3 
— 1, Ko = —A, inverse mixture temperature A = 0.1, energy penalty factor /i = 10 for average 
energy k — —9.66 = U(vt rua ) (and U(v) — —9.55, Eo(v) = —9.82) v(x) = v(—x) symmetric, 
and v = at the boundaries. Because the data support both reference potentials ?;i and V2, the 
approximated v is in regions with no data essentially a smoothed mixture between vi and V2 
with mixture coefficients for prior components po(l\v) = 0.3, po(2\v) = 0.7. 



Comparing Figs. |] ^ one sees that a smaller smoothness leads to a better 
fit of the empirical density. A larger smoothness, on the other hand, leads to 
better fit in regions where smoothness is an adequate prior. Near the boundaries, 
however, where the original is relatively steep, a higher smoothness leads to a 
poorer approximation. A remedy would be, for example, an adapted reference 
potential vq. 

Fig. [7| presents an application of a mixture of Gaussian process priors as given 
in Eq. (f40|). Such mixture priors can in principle be used to construct an arbitrary 
prior density, adapted to the situation under study. For the numerical example a 
two component mixture has been chosen with equal component variances = 
Ko of the form of Eq. ( |54| ) and two reference potentials v i (shown as dashed lines) 
with the same average energy U. In the special situation shown in the figure both 
reference functions Vi fit similarly well to the empirical data. (The final mixture 
coefficients for v\ and V2 are po(l\v) = 0.3 and po(2\v) = 0.7.) Hence, in the no- 
data region the approximated potential v becomes a smoothed, weighted average 
of v\ and V2- Because both reference potentials coincide also relatively well with 
the original v near the boundaries, the approximation in Fig. is better than in 



In conclusion, the two one-dimensional examples show that a direct numerical 
solution of the presented Bayesian approach to inverse quantum theory can be 
feasible. 



Figs. |-| 
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3.4 Classical approximation 

Before discussing a possible approximation for many-body systems we will first 
study the classical limit of inverse quantum statistics. The classical limit is much 
easier to solve than the full quantum mechanical problem and may, for example 
for large masses, already give a useful approximation. 

The phase space density of a classical canonical ensemble is given by 

2 



-1 -/ 3 ( 2 l+-(-)) 



p(x,p c \\v) = Z e V m / , (55) 
with 

dp c \ dxe ( 2m ^ / . (56) 

Here we used p c \ to denote the classical momentum to distinguish it from a density 
p. The probability p(x\v) for measuring x [to simplify the notation we abstain 
in this context from denoting the observable O explicitly] is then obtained by 
integrating over p c \, 

p(x\v) = f d PclP (x,p cl \v) = Z~ l e~^ x \ (57) 

where 

'dxe-M'K (58) 



Notice, that the classical p(x\v) is mass independent, and, most important, that it 
can be obtained directly from v(x) without having to solve an eigenvalue problem 
like in the quantum case. 

Analogously to the quantum mechanical approach the classical likelihood 
model ( |57| ) for position measurements can now be combined with a prior model 
for potentials v, leading to a posterior density p{v\D). In particular, adding a 
Gaussian process prior the log-posterior becomes 

n A / A \ ^ 

h\p{v\D) = —(3/ v(xi) < v — vq I Ko | v — Vq > — nlnZ x — lndet I — Ko ) . 

U 2 V2vr ; 

(59) 

Again, we can refer to a maximum posterior approximation and consider the 
potential which maximizes the posterior as the solution of our reconstruction 
problem. The corresponding stationarity equation is found by setting the func- 
tional derivative of the log-posterior with respect to v(x) to zero, 

= S V \np(v\D) = -f3N - XK {v - v ) + n(3p{x\v), (60) 

where N = ^ S(x — Xi). Choosing an initial guess Eq. (|60|) can be solved by 
straightforward iteration. The results of a classical calculation (with parameters 
and data as in Fig. |||, but without energy penalty term) are shown in Fig. ||. 
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Figure 8. Classical approximation of symmetric potential. Shown are likelihoods (left hand side) 
and potentials (right hand side): Original likelihood and potential (thin lines), approximated 
likelihood and potential v c i (thick lines), empirical density (bars). The dotted line shows v c \ — c 
with constant c — min[uci] ■ miii [ftruc ]. Except for the fact that no energy penalty term has 
been used for this classical calculation the parameters and data are the same as in Fig. ^j. (20 
data points, sampled from the true quantum mechanical likelihood, truncated RBF covariances 
( |54| ) with ctrbf = 7, A = 0.001, inverse physical temperature (3 = 1, v(x) = v(—x) and v = at 
the boundaries.) 



4 Inverse many body theory 

4.1 Systems of Fermions 

In this section the Bayesian approach for inverse problems will be applied to 
many-body systems. To be specific, we will study the simultaneous measurement 
of the positions of N particles. We assume the measurement result to be given 
as a vector Xi consisting of N single particle coordinates Xjj. The treatment can 
easily be generalized to partial measurements of cc, by including an integration 
over components which have not been observed. The likelihood for v, if measuring 
a vector Xi of coordinates, becomes for a many-body system 

p(xi\x,v) = Tr(| • • • x i)N ><Xi tl , ■ ■ ■ x ijN \ pj, (61) 

which is now a thermal expectation with respect to many-body energies E a 

p(xi\x,v) = y~]p a \ip^\xi,i, ■ ■ ■ ,x itN )\ 2 =< \^ N ' (x it x, ■ ■ ■ ,x i:N )\ 2 > . (62) 

a 

In particular, we will be interested in fermions for which the wave functions ip a 
and | xn, • • • , Xi n > have to be antisymmetric. Considering a canonical ensemble, 
the density operator p has still the form of Eq. (|l5|) , but with H replaced now by a 
many-body Hamiltonian. For fermions, it is convenient to express the many-body 
Hamiltonian in second quantization, i.e., in terms of creation and annihilation 



operators 76, 77]. A Hamiltonian with one-body part T, e.g., T = —(l/2m)A, 
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and two-body potential V can so be written 

H = T + V = ^2 T ij a l a 3 + 7 Vi i kl a l a ] a i a k, (63) 



with antisymmetrized matrix elements Vyjy. Hereby, a a a\ + a\a a = <(p a \ l Pj> 
is equal to the overlap of the one-body orbitals \(p a > = <%| 0> and | y? 7 > = 
a 7 |0> which are created or destroyed by the operators a\ or a a , respectively. 
Furthermore, a} a a\ + a 7 a£ =0, a a a 7 + a 7 a a =0. A two-body eigenfunction of the 
Hamiltonian fl65|) can for example be expanded as follows 



^i 2) > = y2 c «,7 1 , > ! (64) 



a, 7 



where \(p a ,<p~> = aaa\\0> denotes a Slater determinant being an antisym- 
metrized wavefunction. 

The symmetrized version of a potential, local in relative coordinates, is 

V xlX2X ' lX ' 2 = v(\xi - x 2 \)(s(xi - x[)5(x 2 - x' 2 ) - 5(xx - x' 2 )5(x 2 - x' x )}. (65) 

Here we can always choose v(0) = 0. Now, assume we are interested in the re- 
construction of v{x) for x > 0. Solving the stationarity equation of the maximum 
posterior approximation analogous to Eq. fl44| ) of Sect. ||, the prior terms remains 
unchanged and only the likelihood terms have to be adapted. Using 

S v ( x )v(\x! - x 2 \) = 5(x - \xi - x 2 \), (66) 

we find 

&v{x)H = o ^ ~] a x 1 a xi-x a %i-x a xi + 2 y ~] a x 1 a xi+x a xi+x a xn (67) 

where x > 0, and can write, similar to the one-body case, 

<^K(x)#l^> , , 

S ^ Ea - <^ a > ' (68) 

|<W)^a> = V] F 1 F I ifryXlp-y \S v (x)H\ V J a>- (69) 



From this the functional derivatives of the likelihoods, 5 v ^p(xi\x,v), can be 
obtained. However, a direct numerical or analytical solution of the full inverse 
many-body equations is usually not feasible. To deal with this problem, a mean 
field approach will be developed in the next section. 
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4.2 Inverse Hartree-Fock theory 

To tackle the inverse many-body problem we will treat it in Hartree-Fock ap- 
proximation [74-77]. Thus, we replace the full many-body Hamiltonian H by a 
one-body Hartree-Fock Hamiltonian H HF = h^ialcii with matrix elements 
h defined, for example in coordinate representation, as 

JV 

Kx> = T XX ' + ^2 < X( Pk I V | x'cp k > , (70) 
k 

the (fik being the iV-lowest (orthonormalized) eigenstates of h. The corresponding 
eigenvalue equation 

hipk = tk<Pk, (71) 

is nonlinear, due to the (/^-dependent definition ( |70|) of h, and has to be solved 
by iteration. The Hartree-Fock ground state is given by the Slater determinant 
|^o> = det{| (fk>} made from the iV-lowest orbitals, and has energy Eq F = 
J2k tkk + \ Y,ki v kiki = Yjk e k ~ \ Hki v kiki- Considering now the case of zero 
temperature, the many-body likelihood for the true ground state ipo, 

p(xi\x,p(v)) = <ip \xi><Xi \ ipo>, (72) 

becomes in Hartree-Fock approximation 

p(Xi\x,p H F(v)) = <@0 I XiXXi | <P >. (73) 

The scalar product of the Hartree-Fock ground state <?o an d the many-body posi- 
tion eigenfunction | X{ > corresponding to the measured vector Xi is a determinant 
and can be expanded in its cofactors M/y.j 

N 

<Xi\$ > = det{<x it i | <p k >} = det B { = M M]i B kl .i, (74) 

i 

Bi being the matrix of overlaps with elements B^i-i = < x^i \<p k > = V 9 fc( x «,()- (For 
the generalization to non-hermitian h see for example |76| , f78[.) 

To maximize the posterior, we have to calculate the functional derivative of 
the Hartree-Fock likelihood with respect to the potential [79] 

8v(x)P(pi\z,PHF(v)) = <fiu( z )^o \ xi><Xi\<P > + <<P \ XiXXi | 5„( a .)# >- (75) 
Here the factors 

N N 
<Xi | 6 V ( X )$Q > = ^ M kl;i<X it l | 5 v{x )if k > = Mkl ^ A kl;i( X )> ( 76 ) 
kl kl 

can be expressed by single particle derivatives A k i-i(x) = <Xi t i\S v ( x \ip k > = 
5 v ( x Wk{ x i,l)- Analogously to Sect. ||| the functional derivatives S v / X )(p k can be 
obtained from the functional derivative of Eq. ( f7T| ) 

(8v(x)h) + h5 v ( x }ip k = (8 v ( x )e k ) ip k + £fc S v (x)<Pk- ( 77 ) 
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Projecting onto < (p k | and using the hermitian conjugate of Eq. ([TlD we find the 



Hartree-Fock version of Eqs. (68) and (|6 



<Vk | h\<fk> . n . 

o«( x )efc = i 1 (,'°J 

8 v (x) l Pk> = Y] \<Pl>«Pl\8v(x)h\tpk>, (79) 



, £fc - Q 



H^k 



where we, as done before, have fixed orthonormalization and phases by choosing 
<$v(x) L Pk | <-Pi > = for orbitals with equal energy. In contrast to Sect. ^, however, 



h, and thus 8 v r x ^h, now obey a nonlinear equation. Indeed, from Eq. (7C) it follows 

N 

S v (x)h x 'x" = ^2 ( <x (fj \6 v (x)V\x" <pj> (80) 

3 

+ < x 8 v (x) <Pj\V\ x" (fj> + < x' <Pj\V\ x" 5 v ( x j cpj>\ 



Inserting Eq. (^) and Eq. ( J80| ) into Eq. (|7y) , we obtain the inverse Hartree-Fock 
equation for 8 v r x \(p k 



i 



I N 

S v (xWk{x') = V (fi(x')Y] ( <<pi<Pj\8 v r x )V\(p k <p j > (81) 



+ < <Pi& v {x)<Pj I V I <PWj > + < <pi<Pj I V I Pk$v(x)<Pj > ) ■ 

Recalling the definition of the antisymmetric matrix elements of V we finally 
arrive at 

1 N 



I 



3 



dzip*(z)tpj(z - x)(p k (z)(pj(z - x) - <p k {z - x)<pj(zf) 

+ Jdz ip* (z)(p*(z + x)(^p k (z)(fj(z + x) - ip k (z + x)(pj(zj^J 
+ J dzdz' ipJ(z)(^5 vix) Lp*(z f )^v(\z - z'\)(p k (z)<pj(z') - ip k (z')<pj(z] 



This linear equation can be solved directly (where for Hamiltonian with real ma- 
trix elements in coordinate space the orbitals, and thus their functional deriva- 
tives, can be chosen real) or, quite effectively, by iteration, starting for example 
with initial guess 8 v i x \(pj{z') = 0. As the 8 v r x \(p k (x') , which are only required for 
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the N lowest orbitals, depend on two position variables x, x' , Eq. ( |S2| ) has essen- 
tially the dimension of a two-body equation. Having calculated S v r x \(fk(xi ! i) = 
A k i;x,i (for x > 0, 1 < k < N, 1 < I < N, 1 < i < n) from Eq. (||) the likelihood 
terms in the stationarity equation (|44| ) follow as 



det B\ 

Tir(Bf 1 + Tr^f 1 ^;^)), (83) 



^(x) x, phf(w)) = j— -5 + ; — -* 

v ; det Bi det R T 



recalling that Mki,i = (-Sj)zfc 1 det Bi and defining analogously to Bi the matrix 
Z\i(x) with elements Aki-i(x). The freedom to linearly rearrange orbitals within 
the Slater determinants det{|x^>} = det{|5j;>} (for each data point i, anal- 
ogously for det{| <fk>}), makes it possible to diagonalize the matrix of overlaps 
< l Pk \ ^i,i> i n new orbitals \xn>, which are then linear combinations of the 

\xu> MM. 



4.3 Numerical example of an inverse Hartree-Fock calculation 

To test the numerical implementation of an inverse Hartree-Fock approach, we 
study a two-body problem, defined by the Hamiltonian 

# = -7^ + ^i + VW. (84) 

Herein we assume the local one-body potential Vi(x,x') = 5{x — x')v\[x) to be 
given and the two-body potential V to be unknown, but local as in Eq. (|65|). 
Hence, our aim is to approximate the function v(\x — x'\), defining the matrix 
elements of V, by using empirical data in combination with appropriate a priori 
information. 

Fig. H shows the results of a corresponding inverse Hartree-Fock calculation. 
(The prior process and parameters are given in the figure caption, computational 
details will be presented elsewhere). For this two-body problem it is possible to 
calculate the exact solution and corresponding likelihood numerically. Hence, we 
was able to sample training data using the exact likelihood. Note that, besides 
the problem of simulating realistic data, an inverse Hartree-Fock calculation for 
more than two particles is not much more complex than for two particles. It only 
requires to add one single particle orbital for every additional particle. Thus, an 
analogous inverse Hartree-Fock calculation is clearly computationally feasible for 
many-body systems with three or more particles. 

We have already discussed in previous sections that, in regions where the 
potential is large, the reconstruction of a potential is essentially based on a priori 
information. Training data are less important in such regions, because finding a 
particle there is very unlikely. In Fig. |9| for example, a priori information is thus 
especially important for large distances. A new, similar phenomenon occurs now 
when dealing with fermions: The antisymmetry, we have to require for fermions, 
forbids different particles to be at the same location. Hence, antisymmetry reduces 
the number of training data for small distances, and a priori information becomes 
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relative likelihoods two-body potentials 




inter-particle distance inter-particle distance 



Figure 9. Inverse Hartree-Fock Approximation: The exact two-body likelihood has been cal- 
culated for two one-dimensional particles with many-body Hamiltonian H, given in Eq. (^4j), a 
local one-body potential Vi(x,x') = S(x — x')a(x/10) 2 , a = 10 -3 breaking translational symme- 
try, mass m — 10 -3 , and a given local two-body potential Kmc of form (p5|) with «true(|a: — x'\) 
= 6/(1 + e - 2 T( a; - fc / 2 )/ fc ) i b = 100, 7 = 10, k = 21, (thin line on the r.h.s.). As training data 100 
pairs {(xi,i, £1,2) 1 1 < i < 100} have been sampled according to that exact likelihood. The corre- 
sponding exact (thin line) and empirical (bars) likelihoods for inter-particle distances \xi,i — Xifi\ 
are shown on the l.h.s. of the figure. To reconstruct the potential a Gaussian prior has been used 
with AKo = A(I — A)/2, A = 0.5 10 -3 , and reference potential (dashed on r.h.s) Vo(\x — x'\) = 
b/(l + e~ 2 ' Yix ~ k/2)/k ), b = 100, 7 = 1, v (0) = 0. The related reference likelihood of inter-particle 
distances is shown on the l.h.s. (dashed). The reconstructed potential v has been obtained by 
iterating with A = Ko according to Eq. (^) and solving Eqs. ( (7l| ) and ( g^ ) within each iteration 
step. The problem has been studied at zero temperature, v fulfilling the boundary conditions 
v(0) — and 11 = constant beyond the right boundary. No energy penalty term Eu had to be 
included. Note, that the number of data is not only small for large inter-particle distances where 
the potential is large, but also for small distances. This effect is due to antisymmetry which does 
not allow particles to be at the same place. Thus, the reconstructed potential v is nearly equal 
to the reference potential vo for large and for small distances. 

especially important. This effect can clearly be seen in the figure, where the 
reconstructed potential v is influenced by the data mainly for medium distances. 
For large, but also for small inter-particle distances, the reconstructed potential 
is quite similar to the reference potential. 

Summarizing, we note that for inverse Hartree-Fock problems in addition to 
the direct Hartree-Fock Eq. (pfTj) a second equation (^|) has to be solved de- 
termining the change of Hartree-Fock orbitals under a change of the potential. 
Despite this complication it was possible to solve the inverse Hartree-Fock equa- 
tions numerically for the example problem considered in this section. 

5 Conclusions 

We have studied the inverse problem of reconstructing a quantum mechanical 
potential from empirical measurements. The approach presented in this paper is 
based on Bayesian statistics which has already been applied successfully to many 
empirical learning problems. For quantum mechanical systems, empirical data 
enter the formalism through the likelihood function as defined by the axioms of 
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quantum mechanics. Additional a priori information is implemented in form of 
stochastic processes. The reconstructed potential is then found by maximizing 
the Bayesian posterior density. 

The specific advantage of this new, nonparametric Bayesian approach to in- 
verse quantum theory is the possibility to combine heterogeneous data, resulting 
from arbitrary quantum mechanical measurements, with a flexible and explicit 
implementation of a priori information. 

Two numerical examples — the reconstruction of an approximately periodic 
potential and of a strictly symmetric potential — have demonstrated the compu- 
tational feasibility of the Bayesian approach for one-dimensional systems. While 
a direct numerical solution is thus possible for one-dimensional problems, it be- 
comes computationally demanding for two- or three dimensional problems. 

As a possible approximation scheme for many-body systems an inverse 
Hartree-Fock approach has been proposed. An implementation of a correspond- 
ing reconstruction algorithm has been tested for a system of fermions, for which 
we were able to solve the inverse Hartree-Fock equation numerically. 

Finally, we want to emphasize the flexibility of the Bayesian approach which 
can be easily adapted to a variety of different empirical learning situations. This 
includes, as we have seen, inverse problems in quantum theory at zero and at 
finite temperature, for single particles as well as for few- or many-body systems. 
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